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FIELD AND BACKGROUND OF THE INVENTION 
20 The present invention relates to apparatus and method for improved efficiency 

of adaptations of finite element meshes for numerical solutions of partial differential 
equations. 

The finite-element method is the most effective of currently available 

numerical techniques for solving various problems arising from mathematical physics 
25 and engineering. In particular, it is the most widely used of numerical techniques for 

solving problems described by partial differential equations (PDEs). 

Time-dependent PDEs arise when modeling numerous phenomena in science 

and engineering, and tend to be divided into two categories: hyperbolic and parabolic. 

The hyperbolic PDE is used for transient and harmonic wave propagation in acoustics 
30 and electromagnetics, and for transverse motions of membranes. The basic prototype 

of the hyperbolic PDE is the family of wave equations. The parabolic PDE is used for 

unsteady heat transfer in solids, flow in porous media and diffusion problems. The 

basic prototype parabolic PDE is the family of heat equations. 
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The concept behind the finite-element method is to reduce a continuous 
physical problem with infinitely many unknown field values to a finite number of 
unknowns by discretizing the solution region into elements. Then, the values of the 
field at any point can be approximated by interpolation functions within every 
5 element in terms of the field values at specified points called nodes. Nodes are located 
at the element vertices where adjacent elements are connected. The approximation of 
the solution at each element should be consistent with neighboring elements. 

Several approaches can be used to transform the continuous physical 
formulation of the problem to its finite-element discrete analogue. For PDEs, the most 

10 popular method of their finite element formulation is that known as the Galerkin 
method as discussed in O. Axelsson and V. A. Barker. Finite Element Solution of 
Boundary Value Problems, Academic Press, Inc., London, 1984, the contents of 
which are hereby incorporated by reference. 

In time-dependent problems, e.g. hyperbolic equations, areas of interest, i.e. 

15 areas with high gradient or with turning points or rapid changes, are propagated 
through the domain, that is they tend to move over time across the solution domain. 
Therefore, the mesh choice is preferably dynamic and follows the propagation of the 
areas of interest with time. For example, when the solution of hyperbolic problems 
involves a shock wave propagating through the mesh the location of the shock 

20 vicinity keeps changing in time. Thus, one wants to have the mesh more refined 
around the area of the shock vicinity and less refined elsewhere. Another example is 
the problem of fluid flow in a cavity, where flow cells are generated and undergo 
continuous changes in their shapes and size as time proceeds. Thus, the mesh 
adaptation itself is a crucial part of the efficient computation of the numerical method. 

25 In order to achieve an optimal mesh, that is one in which the solution error is low 
relative to the number of nodes in the mesh, the mesh choice needs to be dynamic and 
must vary with time. 

Current systems use indicators such as gradients from the solution at a present 
stage to identify where the mesh should be modified, that is to say where it should be 

30 refined and where it can be made coarser, at the next time stage. However, such a 
system suffers from the obvious defect that it operates one step behind. In other 
words, if the areas of interest are propagated, then the current systems leave refining a 
step behind the most interesting phenomena. 
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There is thus a widely recognized need for, and it would be highly 
advantageous to have, a device and method for optimizing finite mesh based 
numerical solutions which is devoid of the above limitations. 

SUMMARY OF THE INVENTION 

According to one aspect of the present invention there is provided apparatus 
for calculating numerical solutions for partial differential equations in successive 
intervals using adaptive meshes, comprising: 

a neural network part for producing predictions of values of a parameter at a 
following interval based on values of said parameter available from previous 
intervals, and 

a mesh adaptation part, associated with said neural network part, configured 
for adapting a mesh over a domain of a respective partial differential equation using 
said predictions, such that said mesh adaptively refines itself about emerging regions 
of complexity as said partial differential equation progresses over said successive 
intervals. 

Preferably, the successive intervals are time intervals. 
Preferably, the parameter is a gradient. 

Preferably, said mesh adaptation part is further configured to adaptively 
coarsen said mesh about regions of low complexity. 

Preferably, said mesh adaptation part comprises a first thresholder for 
thresholding gradients from said neural network parts, such that gradients above said 
threshold value are taken to indicate complexity and to lead to local refining of said 
mesh. 

Preferably, said mesh adaptation part comprises a second thresholder for 
thresholding gradients from said neural network parts, such that gradients below said 
threshold value are taken to indicate complexity and to lead to local coarsening of said 
mesh. 

Preferably, said neural network part comprises two neural networks, each 
having an input layer of input elements, at least one hidden layer of hidden elements 
and an output layer of at least one output element, said two neural networks differing 
from each other in respective numbers of input elements. 
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Preferably, each hidden element defines a hyperbolic tan-sigmoid transfer 
function. 

Preferably, each output element defines a linear transfer function. 

Preferably, a first of said neural networks is a boundary element neural 
network for calculating gradients of boundary elements of said adaptive mesh, and a 
second of said neural networks is an interior element neural network for calculating 
gradients of interior elements of said adaptive mesh. 

Preferably, said boundary element neural network has fewer input elements 
than said interior element neural network. 

Preferably, said input elements are connected to gather for a given mesh 
element a gradient of said mesh element, and a gradient of each neighboring element 
for each of a cunent and a previous interval. 

Preferably, each hidden element defines a hyperbolic tan-sigmoid transfer 
function, each output element defines a linear transfer function, a first of said neural 
networks is a boundary element neural network for calculating gradients of boundary 
elements of said adaptive mesh, and a second of said neural networks is an interior 
element neural network for calculating gradients of interior elements of said adaptive 
mesh, and said boundary element neural network has fewer input elements than said 
interior element neural network. 

Preferably, said interior element neural network comprises eight input 
elements, six hidden elements and one output element. 

Preferably, said boundary element neural network comprises six input 
elements, six hidden elements and one output element 

Preferably, said neural network part is trainable by using initial random 
interval calculations of a respective partial differential equation. 

Preferably, said neural network part is configured for training using the 
Levenberg Marquardt training method. 

According to a second aspect of the present invention, there is provided a 
method of adapting a finite element mesh interactively with calculations of numerical 
solutions for a partial differential equation in successive intervals, said partial 
differential equation being calculated over a domain in accordance with respective 
finite elements of said mesh, each successive interval using a further adaptation of 
said mesh, the method comprising: 
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solves the PDE, uses the solution to build up a set of training data and uses the 
training data to train the neural network. The result is one or more neural networks 
optimized to predict areas of interest. Connected to follow the learning unit is a finite 
element mesh adaptation unit 14 which receives the partial differential equation itself 
5 as an input and computes local results based on a mesh. The adaptation unit uses the 
neural networks produced in learning unit 12 to identify areas of interest based on 
predictions and to carry out local refining and/or coarsening of the mesh, based on the 
predictions, so as to concentrate resources in the most important areas of the solution. 
At each element position for each time stage a solution including a gradient is found 

10 numerically to the PDE and the gradient is fed back to the neural network which 
produces an indication as to whether to refine (or coarsen) the mesh at that point for 
the next stage. Thus the mesh progresses in time to follow changing areas of interest, 
being adapted in stages using modifiers provided by the learning unit 12, As will be 
explained in detail below, the modifiers are forecasts of areas of interest in the 

15 solutions. 

Typically, learning is carried out using the same PDF as the adaptation. 
However, this may not always be possible or convenient. Learning may be carried 
out on one PDE and the resulting neural networks then used to apply predictions to a 
second PDE. In addition, learning may be carried out on one or more PDEs and the 
20 results applied to one or more other PDEs. Furthermore, if a solution is required for a 
given PDE over a time interval (0,T) then learning may be carried out over a first part 
of the interval (0, ti) and then use the predictions produced by the trained networks 
over the remainder of the interval (tj+dt, T). 

It is noted that in learning unit 12, no mesh modification is carried out. 
2:; However, it is possible to carry out learning whilst concomitantly modifying the 
mesh- 
Before explaining the operation of Fig. I in greater detail, a brief survey is 
made of time series neural networks. 



30 Background of Time Series Neural Networks 

Peed-Forward Neural Networks 

Neural networks (NNs) are a biologically inspired model, which tries to 
simulate the network of neurons, or the nervous systems, in the human brain. The 



of the invention in more detail than is necessary for a fundamental understanding of 
the invention, the description taken with the drawings making apparent to those 
skilled in the art how the several forms of the invention may be embodied in practice. 
In the drawings: 

FIG. 1 is a simplified block diagram illustrating components of a 
calculation system according to a first embodiment of the present invention; 

FIG. 2 is a simplified diagram illustrating a three layer neural network 

structure; 

FIG. 3 is a simplified diagram showing in greater detail the calculation 
system of Fig. 1 during the course of two processing stages involving two respective 
adaptation stages of a finite element mesh; 

FIG. 4 is a simplified flow diagram showing the operation of a 
preferred embodiment of the present invention according to a preferred embodiment 
of the present invention; 

FIG. 5 is a graph illustrating the results of a typical prediction test for 
interior and boundary elements to test time series prediction for the ID Wave 
Equation; 

FIG. 6 illustrates the analytic solution for a ID wave equation used in 

the example; 

FIG. 7 is a comparative graph showing two successive stages of NN 
and standard mesh adaptation shown side by side; 

FIGs. 8A and 8B are graphs showing mean square and infinite error 
norms for each of the NN and standard adaptation techniques; 

FIGs. 9 is a graph illustrating the analytic solution for another wave 
equation used in an example; 

FIGs. 10-13 illustrate experimental results for the wave equation of 
Fig. 9 using a standard method, a neural network method using refining only and a 
neural network method using both refining and coarsening; and 

FIGs. 14-16 illustrate results achieved using a two-dimensional wave 

equation. 



DESCRIPTION OF THE PREFERRED EMBODIMENTS 



Basic learning algorithms and the neural network model are applied to the 
problem of mesh adaptation for the finite-element method for solving time-dependent 
partial differential equations. Time series prediction via the neural network 
methodology is used to predict areas of interest in order to obtain an effective mesh 
refinement at the appropriate times. The ability to concentrate resources effectively on 
such areas of interest allows for increased numerical accuracy with the same 
computational resources as compared with more traditional methods. 

The present embodiments provide an improved approach to solving the mesh 
adaptation problem. The approach looks at mesh adaptation as a special instance of a 
control problem and uses the neural network to solve it in a similar way to that in 
which such networks have been used to predict time series. 

The neural network is a universal approximator that learns from the past to 
predict future values. It receives, in some form, as input certain areas of interest of a 
present stage, and predicts areas of interest for the next stage. Using the predictor the 
present embodiments forecast the position of the action and refine the mesh 
accordingly. 

The embodiments and supporting experiments described herein are for one 
and two-dimensional hyperbolic equations, that is to say wave equations, but are 
easily extendible to higher dimensional problems and other time-dependent PDEs. 

The principles and operation of a forecasting system for areas of 
interest according to the present invention may be better understood with reference to 
the drawings and accompanying description. 

Before explaining at least one embodiment of the invention in detail, it 
is to be understood that the invention is not limited in its application to the details of 
construction and the arrangement of the components set forth in the following 
description or illustrated in the drawings. The invention is capable of other 
embodiments or of being practiced or carried out in various ways. Also, it is to be 
understood that the phraseology and terminology employed herein is for the purpose 
of description and should not be regarded as limiting. 

Reference is now made to Fig. 1, which is a simplified block diagram 
showing a general scheme for a preferred embodiment of the present invention. In 
Fig. 1, a numerical analysis device 10 for providing numerical solutions to partial 
differential equations and the like comprises a neural network learning unit 12 which 
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solves the PDE, uses the solution to build up a set of training data and uses the 
training data to train the neural network. The result is one or more neural networks 
optimized to predict areas of interest. Connected to follow the learning unit is a finite 
element mesh adaptation unit 14 which receives the partial differential equation itself 
as an input and computes local results based on a mesh. The adaptation unit uses the 
neural networks produced in learning unit 1 2 to identify areas of interest based on 
predictions and to carry out local refining and/or coarsening of the mesh, based on the 
predictions, so as to concentrate resources in the most important areas of the solution. 
At each element position for each time stage a solution including a gradient is found 
numerically to the PDE and the gradient is fed back to the neural network which 
produces an indication as to whether to refine (or coarsen) the mesh at that point for 
the next stage. Thus the mesh progresses in time to follow changing areas of interest, 
being adapted in stages using modifiers provided by the learning unit 12. As will be 
explained in detail below, the modifiers are forecasts of areas of interest in the 
solutions. 

Typically, learning is carried out using the same PDE as the adaptation. 
However, this may not always be possible or convenient. Learning may be carried 
out on one PDE and the resulting neural networks then used to apply predictions to a 
second PDEIn addition, learning may be carried out one or more PDEs and the results 
applied to one or more other PDEs. Furthermore, if a solution is required for a given 
PDE over a time interval (0,T) then learning may be carried out over a first part of the 
interval (0, ti) and then use the predictions produced by the trained networks over the 
remainder of the interval (ti+dt, T). 

It is noted that in learning unit 12, no mesh modification is carried out. 
However, it is possible to carry out learning whilst concomitantly modifying the 
mesh. 

Before explaining the operation of Fig. 1 in greater detail, a brief survey is 
made of time series neural networks. 

Background of Time Series Neural Networks 

Feed-Forward Neural Networks 

Neural networks (NNs) are a biologically inspired model, which tries to 
simulate the network of neurons, or the nervous systems, in the human brain. The 
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artificial neural networks consist of simple calculation elements, called neurons, and 
weighted connections between them, A neural network can be trained to perform 
complex functions by adjusting the values of the connections (weights) between the 
elements (neurons) according to one of several algorithms. For the purposes of the 
5 present embodiments, the neural network may be considered simply as a data 
processing technique that maps, or relates, any kind of stream information to an 
output stream of data. 

The most common NN model is the supervised-learning, feed-forward 
network, 

10 Reference is now made to Fig. 2, which is a simplified schematic diagram 

showing the configuration of a single hidden layer feed-forward network with n input 
units, p hidden units and m output units. 

As shown in Fig. 2, the typical feed-forward network contains three types of 
processing units (neurons), input units, output units and hidden units, organized in a 

is layered hierarchy as follows: an input layer, hidden layers and an output layer. The 
data from the input layer axe multiplied by the weights associated with a next-layer 
unit and summed together before processing by the unit in the next layer. Likewise 
the data from the hidden layer are summed together by weights associated with an 
output unit prior to processing by the output layer unit 

20 There are a number of training or learning algorithms. Of particular interest is 

the algorithm known as back-propagation. Back propagation consist of two phases: 
feed-forward propagation and backward propagation. In feed-forward propagation, 
the input units send the input signals forward through the netwoxk to produce an 
output. Then, the difference between the actual and desired outputs produces error 

25 signals which are sent backwards through the network to modify the weights between 
neurons. Modifying the weights between two connected layers is achieved by using a 
minimization procedure. The forward and backward propagation operations are 
executed itcratively over the training set until convergence occurs. Convergence is 
the state when the average squared error between the network outputs and the desired 

:;0 outputs reaches an acceptable value. 

The mathematical expressions governing the neural network of Fig. 2 may be 
written as, where /is the activation function of the processing units; 
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Z; = /(Vo, + Ex,V,y) (1) 
y k = f(Wo k + Y t ZjW jk ) (2) 

^ = 0.5E(/-j; A ) 2 (3) 

Training Algorithm 

Back-propagation is the most popular training algorithm, and in back 
propagation the training data propagated through the network and the output data are 
calculated as in equations 1 and 2. The error between the expected output and the 
calculated output is computed as in equation 3. Then a minimization procedure is used 
to adjust the weights between two connection layers which proceed backwards from 
the output layer to the input layer. There are a number of variations of minimization 
procedures that are based on different optimization methods, and these include the 
gradient descent, Quasi-Newton and Levenberg-Marquardt methods. Each of these is 
now briefly considered. 



Gradient Descent Method 
The simplest implementation of back-propagation training uses a method in 
which the network weights are moved along the negative of the gradient of the error 
function. The gradient descent method is a first order learning algorithm, meaning that 
it only uses information about the first order derivative when it minimizes the error. 
The major drawback of the gradient descent method is that it requires a large number 
of steps before converging on a solution. One iteration of the gradient descent 
algorithm can be written as: 

dE 

w jk (new) = Wjk (old) - a- (4) 

d w Jk 

ViJ (neW) = ViJ (old) - a~ (5) 

Where a is the learning rate. Let W = { V(/ ,*v *}be the vector of all the connection 
weights in the network. Then we can write equation 4 and equation 5 as: 



W M = W,-a f V w ,E 



(6) 



11 



Quasi-Newton Method 
The Quasi-Newton optimization method is a second order learning algorithm. 
The Quasi-Newton method provides an alternative and more efficient way of 
5 minimizing the sum of squares error E given in equation 3. The Quasi-Newton 
method is one of the fastest learning methods and provides a substantial improvement 
in the convergence rate over the gradient descent method. One iteration of quasi- 
Newton algorithm can be written as: 

10 Wi^W-H' x V w E (7) 

where is the inverse of the Hessian matrix (second derivatives). The 
elements of the Hessian matrix take the form: 

15 Hij = - ^,5 , Where w , Wj eW. (8) 

The main drawback of the Quasi-Newton method is that it requires an 
expensive and complex computation in order to calculate the Hessian matrix in each 
iteration. 

20 

Levenberg-Marquardt Method 
The Levenberg-Marquardt method combines the gradient descent methods 
with the Quasi-Newton method. One iteration of the Levenberg Marquardt algorithm 
can be written as: 

25 

W M = W~ (J T J+JLlI)- X W, E (9) 

Where J is the Jacobian matrix of error derivatives with respect to weights, // 
is a scalar control parameter an / is the identity matrix. The computing of the 
30 Jacobian matrix is much less complex than computing the Hessian matrix; the Hessian 
matrix can be approximated as H = J T J . 
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Time Series Neural Networks 
Generally speaking, the prediction of a future gradient from a series of past 
values in a gradually changing PDE solution domain U very similar to solving time 
5 series. More particularly, time series is well suited for data where past values in the 
series may influence future values. In time series, a future value is a nonlinear 
function of the past m values: 

*(») - KM* - 1), x(n -2), , *(«- «)) (10) 

10 

Using a time series therefore means that it is necessary to fit a function x 
through its past values in order to extrapolate the function to the near future. 

Now a feed-forward network can approximate any function after a suitable 
amount of training. Thus such a feed-forward network can be applied to the function 
«5 fitting problem by submitting discrete values of the function to the network. The 
network is then expected to learn the function rule using the training algorithm. The 
behavior of the network is changed during the training algorithm by modifying the 
values of the weights. 

Thus it is possible to use the back-propagation network as a nonlinear model 
20 that can be trained to map past and future values of a time series. Such a method, 
when used with neural networks, is called time series prediction and is used in the 
forecasting of financial markets for example to predict whether stock market rates will 
rise or fall. 

25 Applying NNs to Time-dependent PDEs 

PDEs tend to describe a continuum in which some areas are subject to 
considerable change in gradient whereas large areas behave relatively smoothly. A 
successful numerical method therefore requires to identify as early as possible the 
areas of rapid change, hereinafter critical areas or areas of interest, so as to 
3<> concentrate thereon. When using finite element mesh (FEM) based methods in order 
to solve PDEs numerically, critical regions are typically made subject to local mesh 
refinement. The critical regions are defined as those regions for which the local 
gradient shows larger changes. In order to accommodate the critical regions, the FEM 
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adaptation process makes a local refinement in those areas, leaving the remaining 
mesh more coarse in the other areas. In time-dependent problems, the mesh 
refinement is preferably dynamic and depends on the error estimation at each time 
stage. 

5 In the current art, most of the error estimate methods take into account a 

solution gradient. However, the present embodiments improve on the current art by 
being based on predicting the future gradient value of the solution. It is this predicted 
future value of the gradient which is used as the refinement criteria for the finite 
element mesh. In alternative embodiments of the invention, instead of the gradient, 

10 the second derivative is used. The second derivative is often a better prediction of the 
presence of complexity than the gradient. In further alternative embodiments the 
energy of the solution is used as a prediction of complexity. The energy is related to 
the gradient but is not identical thereto. 

In dynamic systems, e.g. hyperbolic equations, the areas of interest, i.e. the 

15 areas with high gradient, tend to propagated through the domain over the course of 
time. Therefore, for each mesh element the future gradient value is influenced by the 
past gradient values of the element and of its direct neighbors. In other words, the 
future gradient value can be considered as a nonlinear function of its past values. 
Predicting the future gradient from passed gradients in such circumstances is thus a 

20 true time series problem and thus time series neural networks are ideal tools for 
predicting the future gradient values. Reference is now made to Fig. '3 which 
illustrates the concept of using predictions of future gradient values computed using a 
time series technique to locate the critical regions for refinement of the mesh. 

Fig. 3 shows a neural network 30 of the kind shown in Fig. 2, an upper mesh 

25 domain 32 and a lower mesh domain 34. The upper domain 32 represents the mesh at 
time f„_, and the lower domain 34 shows the mesh at time t n . At time /„ it is apparent 
that some elements are refined. The structure of a typical neural network 30 for the 
present purpose is now considered. The neural network 30 uses eight input units 36. 
The input elements take from the mesh the value of the gradient of the element and its 

30 two neighbors in the current 34 and previous 32 iterations. Hidden units 38 preferably 
use a hyperbolic tan-sigmoid transfer function. A single output unit 40 has a linear 
transfer function. The output of the output unit 40 is preferably the gradient 
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prediction in the region of the given mesh element, and is used to generate a new 
version of the mesh, that is to say a mesh for t n +i. 

The neural network 30 receives as its input the gradient of element e and its 
neighbors el; e2 and e3 at times *„_,and /„, and as output it provides a predicted 
gradient value of element e at time t n+] . The above assumes of course that the network 
is already trained, and in practice the network has to be trained on each individual 
PDE. Thus there are two steps in the methodology of the present embodiments as 
follows: 

(a) training the neural network to predict the indicators, the indicators being 
the gradients, at least one step in advance, and 

(b) obtaining outputs and using them in the trained neural net to obtain 
indicators for refinement (and/or coarsening) of the mesh in the FEM solution. 

Reference is now made to Fig. 4, which is a simplified diagram illustrating the 
procedural flow for providing a numerical solution to partial differential equations 
according to a preferred embodiment of the present invention. In Fig. 4, the 
procedural flow is divided into two parts, a neural network (NN) training phase and a 
finite element mesh (FEM) adaptation phase. In a stage SI, input data is received 
which typically includes the following: 

(i) PDE Parameters, such as wave speed, 

(ii) Time interval of interest (0, T) and time step, (dt), 

(iii) Training threshold, 

(iv) Number of FEM elements on the initial mesh, 

(v) Refinement threshold, and 

(vi) Number of examples in the training set. 

An FEM mesh is then constructed in stage S2 over the domain of the PDE. 

Stage S3 comprises calculating the solution to the PDE and corresponding 
gradients on the initial mesh, that is the mesh defined non-dynamically in stage S2. 
The calculations are made for all of the relevant time interval (0, T) at steps of dt. 

In more detail S3 begins by setting a time variable t n to zero. Then for as long 
as t n is less than a maximum time T, t n is incremented by dt, and for each increment 
the PDE solution and gradient are calculated. 

The next stage, S4, involves creating and initializing the neural networks. 
Two neural networks are constructed, one for interior elements and one for boundary 
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elements. In a preferred embodiment, the neural network for interior elements 
comprises eight input units, six hidden units and one output unit. For boundary 
elements there are six input units, six hidden units and one output unit. The input 
units take gradient values of the mesh element and its neighbors from a current and a 
5 previous iteration. The hidden elements apply a hyperbolic tan-sigmoid transfer 
function and the output unit applies a linear transfer function to give the time series 
prediction as an overall output. 

Stage S5 involves making a selection of random time stages over the time 
interval of interest and constructing a training set for the FEM elements at each stage. 
10 Constructing the training set is carried out for both of the neural networks constructed 
in S4, that is both for the interior element and the boundary element networks. Each 
training set comprises an input and an output, the input being the gradient values of a 
FEM element e and its neighbors at time stages t n and t n _i. The output is the gradient 
of element e at t n+ j. 

15 In stage S6, the neural networks are trained using the training sets. In a 

preferred embodiment, the networks are trained using the Levenberg-Marquardt 
training method. 

The procedure then moves on to the Mesh adaptation phase with stage S7 in 
which the time variable t n is set to dt, that is to say the first step inside the interval of 

20 interest (0,T). In step S8, the PDE is solved for time t n to obtain gradient values over 
the current mesh. The results of step S8 are used as inputs to the neural networks and 
predictions are produced of the gradients for the next step. In step S9 the mesh is 
adapted using predictions from the neural network. The mesh is refined where the 
gradient is above a threshold and otherwise left alone in one embodiment. In another 

25 embodiment the mesh is also coarsened where the gradient is below a second 
threshold. 

In stage S10 the time interval is moved forward by dt and a test stage Sll 
returns the procedure to S8 until the end of the time interval is met. The final output 
at S 12 is the PDE solution at each step dt over the interval of interest. The mesh has 
30 been continually adapting over the solution to concentrate on areas of interest as they 
emerge over the time interval. Thus if a region of interest is in one corner early in the 
time interval, and then disappears, and a second region of interest appears later on in 
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another comer, the procedure succeeds in concentrating resources on the first region 
early on and to transfer the resources rapidly to the second region as it emerges. 

Additional objects, advantages, and novel features of the present invention 
will become apparent to one ordinarily skilled in the art upon examination of the 
following experiments and comparative examples, which are not intended to be 
limiting. 



Experiments 

Reference is now made to the following Experiments which, together with the 
above description, illustrate the invention in a non limiting fashion. 

In the following experiments we examine the numerical results of applying 
this procedure using the FEM on different time-dependent PDE problems using 
different parameters for the NN algorithm and comparing this with (i) FEM with no 
adaptation and (ii) FEM using the "standard" adaptation via the current gradient 
indicator. 

Measures of Solution Quality 

To measure the quality of the FEM solution, we calculate both of 
the L 2 and /Terror norm per value in each time stage. (Here u is the analytic solution 
and u h is the numerically computed solution). 



e error /value= ^K^> ~ ^ode)^ (J {) 

Z„o<,esH node )\ 

r error I value = ^^ ( X 2) 

m ax H ^|«(/io^)| 



The terror norm measures the error over the entire solution space, the 
average error, and the IT measures the maximum error occurring in the solution. 

Use of an analytic solution is important in order to measure the precise error in 
the solution. When the analytic solution of a PDE is not available, we do one 
calculation with a very small time step and a very fine constant mesh, and then we use 
this solution as a reference to the analytic one. In all cases, we report at the end of 
each experiment the average of Z*and U° error per value over all the time space. - 
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Neural Network Architecture and Training 

MATLAB's Neural Network Toolbox was used for designing and training the 
neural network; and the MATLAB's Partial Differential Equation Toolbox was used 
5 for defining, and solving the two dimensional PDEs problems. For one dimensional 
problems, we used a FEM solver which we developed especially for our research 
needs. In the examples tested so far the results are fairly dramatic. First, using the 
Levenberg-Marquardt training algorithm the training was both quite swift and 
exceptionally accurate. 

10 Second, the improvement in the FEM numerical results, as compared with the 

standard gradient adaptive method, was as high as 25% in some examples; and never 
fell significantly below the standard method. The variance in the improvement 
depends on the shape of the wave; and is to be expected. The variance simply 
expresses the fact that for some waves it is more important to predict the gradient than 

15 others. 

The experiments use two different networks, one for boundary elements and 
one for interior elements. It is noted that embodiments using a single network or more 
than two networks are also possible. In particular it may be desirable to use two 
networks for different interior elements. The architecture of the networks as used in 

20 the experiments uses six input units for boundary elements networks, and eight input 
units for interior element networks. The input elements thus correspond to the value 
of the gradient of the element and its two neighbors in the current and previous 
iterations. For both the networks, six hidden units are used, as opposed to the version 
shown in Fig. 3, where only three hidden units are used. The hidden units preferably 

25 use a hyperbolic tan-sigmoid transfer function. Both networks preferably use a single 
output unit having a linear transfer function. The output of the output unit is 
preferably the gradient prediction in the region of the given mesh element. The 
skilled person will appreciate that other numbers and combinations of input, hidden 
and output units are possible. Also the training algorithms and the active functions in 

30 the various neurons can be varied. 

In order to make the training more efficient: (a) we normalize the input and 
output data between the values 0 and 1; and (b) we divide training data into two 
disjoint subsets: a training set and a testing set. The training set is used for computing 



the gradient and updating the network weights and biases. Testing on the validation 
set is monitored during the training process; as long as the error decreases, training 
continues. When the error begins to increase, the net begins to overfit the data and 
loses its ability to generalize; at this point the training is stopped. The skilled person 

5 will be aware that the above strategy may be varied. 

To generate training data: (a) we calculate the solution on the initial non- 
dynamic mesh over all the given time space; (b) we select a collection of random time 
stages, and build training examples for all the elements in the time stages selected. 
The training data, consists of more than 800 examples (about 600 for training set and 

10 200 for testing set). 

One Dimensional Wave Equations 
Mesh Refining 

In experiments, the NN modifier was run over a variety of initial conditions 
15 for the one dimensional wave equation. In all cases, the NN predictor was extremely 
accurate. 

Reference is now made to Fig. 5, which is a graph showing the results of a 
typical prediction test for interior -below- and boundary -above- elements to test time 
series prediction for the ID Wave Equation. Training took about 117 time stages to 

20 converge to an extremely small error (about 0.00024) in the interior element 
prediction. In Fig. 4, for each of the boundary and interior cases (x) indicates test 
values; and (o) indicates the network response or prediction. The graph shows 
randomly selected test cases for various times. 

Comparing the upper to the lower parts of Fig. 5 shows that results for the 

25 boundary elements were similar to those of the interior elements. When applying the 
modifier to the FEM mesh, the numerical improvement over the "standard" gradient 
modifier varied from no significant improvement to an improvement of more than 
25%, with respect to both the L 2 error norm and the IT error norm. 

In order to consider the various initial conditions used for the ID wave 

30 equation, particular reference is now made to Example 2 in Table 1 below, in which 
the initial condition of the wave is a Gaussian. The analytical solution is well known 
for these types of problems, and reference is now made to Fig. 6, which illustrates 
such an analytic solution. The solution depends on the initial and boundary 
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conditions. The wave splits into two waves, with the same width but half the height, 
that travels to the left and to the right with speed c = l . When such a traveling wave 
reaches the edge it turns over and returns upside down. Two successive stages of the 
NN modified solution and the "standard" gradient modifier are displayed side by side 
5 in Fig. 7 along a line indicating the analytic solution. Dots along the line in each case 
indicate the density of mesh elements. A particular area of interest involving a sharp 
turning point is indicated by a marker rectangle. Observing the areas indicated in the 
figure, one can see that the NN has chosen to place its resources in the correct places. 
Looking at the refinement markings, shown as dots on the x- axis, one can see that, as 

10 suggested, the NN keeps pace with the development of the solution, whereas the prior 
art method is always one-step behind. At critical locations the standard method thus 
suffers from increased numerical error. 

Since the one-dimensional examples have analytic solutions, it is possible to 
keep track of the actual numerical errors of each of the methods. Reference is now 

15 made to Figs. 8 A and 8B, which show tracking of the errors respectively using the/, 2 
error norm and the E° error norm. In Figs. 8 A and 8B, the neural net predictions are 
indicated by X and the standard method predictions by O. 

The wave of the analytical solution reaches the edge at time t = 15 (see Fig. 5 
above) and it starts to turn over and returns upside down. Such a phenomenon is 

20 clearly shown in Fig. 8, in which the errors, both in the L 2 error norm and in the 
£° error norm, begin to rise at time / = l5and keep changing until the wave starts to 
return upside down (at r = 25). During this critical time interval, the method of the 
present embodiments shows a distinct advantage over the standard method. 
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Table I: Qm diuieasion examples. Comparison between FEMk rim with p) The 
neural network predictor of the gradient ineasiire, (ii) "Stamford" refinements 
iLsiug t he gradient, measure, (in) No adaptation. 



Example I 

0 <tf»«£ tr{HU)~0 



f* t*,0)~Q arid 



» *|, !<:*-< 2 

otherwise 

Number of Initial Elemental*}, Tune:12 ; Time $tep:0-0H 
Threshold for refinement = 0.QB (gradient) 



■(;■ 



I? Error Norm JL^' Error Norm 
Method N%nber of Refined Elements Max Average Max Average 



NN Modifier 70 
Standard Modifier 70 
No Adapt atkm 0 



o.ia?50 aoaay O-iMs nuws 

0.1826 0-1022 0.1914 0.1222 
2.5332 LCJ053 3.470$ 1.0(14$ 



Improvement: J? error norm a 15% r L?° ew«r mm - 13.0% 



f <*•«» 



Example 2 



0 < * < 25 



v 0 «(25 ? 0-0 



0, 



0 < * < 10 
other-wine 



Number of Inifei Elements:!^ Time:2C». Time Sfep:0.12 



Threshold for refinement » 


0.2 (gradient) 






L 2 Error Norm 


L 2 * Error Norm 


Method Number of Refined Btemente 


Max 


Average 


Max Average 


NN Modifier OA 
Stmidard Modifier 1*1 
No Ad apt anon U 


0.4423 
1.4622 


(hum 


mm 0.2342 
O.H230 0.3 M2 
1.0288 l».Cf456 


Iniprwjnitmt: L 2 error uorm ^ 2$% : 


£Oi2 |?rrr>r l^ipm -sr 


25% 



Mesh Coarsening 

5 The NN predictions provide modifiers which can be used both for refining and 

coarsening the FEM mesh at the same time. Thus, when the gradient of an element is 
bigger than a given refinement threshold the system decides to refine the mesh, and 
when it is less than a given coarseness threshold the system decides to coarsen the 
mesh. In the following, we present two examples both using the same wave equation. 
10 The examples used are Example 3 and Example 4 in Table 2 below. In Example 3, we 
use the NN modifiers only for refining the FEM mesh and in Example 4 we the NN 
modifiers both for refining and coarsening the FEM mesh at the same time. 
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The results are shown in figures 9- 1 3 to which reference is now made. Fig. 9 
presents the analytic solution of the given wave equation, Figs. 10A and 10B present 
the prediction graph of the NN modifier, in which Fig. 10A shows gradient prediction 
for boundary elements and Fig. 10b shows the same for interior elements. The figure 
5 illustrates time series prediction tests for Example 3 and Example 4 respectively. Each 
prediction on the graph shows the network prediction (o) and the observed actual 
result (x) for randomly selected test cases for various times. 

Fig. 1 1 shows two successive stages of the mesh development for the neural 
network and standard methods respectively for Example 4. That is to say the left hand 
10 side shows FEM mesh modification involving both refining and coarsening in 
accordance with the NN modifiers. Also indicated is the analytic solution. The right 
hand side shows results modified with the standard gradient indicator. A comparison 
of the critical segments of the curves on the left, the enclosed rectangles with the 
corresponding segments on the right clearly indicates how the NN predictor is 
15 successful at focusing the resources earlier on in the correct places. 

Figs. 12A and 12B present the square and infinite error norms respectively for 
Example 3 against the standard method and finally Figs. 13A and 13B show the same 
for Example 4. 

In both examples the results show that the NN modifier results are more 
20 effective than the standard gradient modifier. In Example 3, an improvement of 8.7% 
in the L 2 error norm and 5.4% in the r° error norm are achieved (see Table 2). The 
same improvement is apparent by comparing the error norm graphs in Fig. 12A and 
12B. 

Comparing the left hand, NN enhancement, and right hand, standard method, 
25 graphs in Fig. 1 1, it is apparent that the NN modifier has placed the resources in the 
correct places and this explains the enhancement in the results. 

A comparison between Figures 12 and 13 illustrates the benefit of using the 
NN modifier for coarsening as well as for refining. When using the NN modifier to 
coarsen the mesh, as in Example 4, the improvement rises from 8.7% as measured by 
30 the L 2 error norm to 9. 1 % and from 5.4% to 8.4% when measured by the IT error norm 
(see Table 2 and Fig. 13). 
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Tiibta 2; FEM mesh refining and eoarseitmg fcx*imj>k*i 4 



Example 3 

i^O s t)«fl wn<f u(12.t)»() 

f exnrT^mfeFn. 1 < :v < 2 



otherwise 

Niitiibor of Initial EUwmizM, Tinm:2U f Time Stopftftl 



Threshold for refhwiuifcnt ~ 


* 2 (gradient) 






£ 2 Error Norm 


L 7 ^ Error Norm 


Method Number of Re&ned Elements 


Mux Average 


Max Average* 


NN Modifier »j 
Standard Modifier 90 


0.4140 0.2678 

i)Am 0.2935 


0.5007 0.302S 
0.581)1 0.3201 


Improvement: I? error norm «r $.7%, 


I** ttrror norm — 


5.4% 



Example 4 

= 0, 0<*<12 
«(0.t)»0 «w/ 1*112, t)»0 



J expfajtemfR-a), 1 < a; < 2 



Number of Initial EJem*nte:ft0, Time:20 r Time Siep:O.09 
Threshold for r<*fin*>iii^i:it = 2 (gradient), Threshold for f*«*r$eite$$ = 0.3 

L 2 Error Norm £ ~^ Error Norm 
Method Number of Refilled Element* M&x Average &lax Average, 

NN Modifier 90 0.7155 0.3710 O.S<>4<> 0.347$ 

Standard Modifier 73 0.TOH 0.4085 0.73:11) 0.371H) 

Improvement: L 2 error norm ^ L* 0 error norm ™ $A% 



Two Dimensional Wave Equations 

In a further set of experiments, the method was applied to a variety of two- 
5 dimensional wave equations. In all cases, the NN modifier showed a clear 
improvement although there was considerable variance in the extent of the 
improvement. Over the set of experiments carried out, as a whole the rate of 
improvement varied between around 2% and around 20%. The variance in the 
improvement depends on the initial conditions, for example the shape of the wave and 
10 its velocity. In the following we present just two examples, and the details are given 
in Table 3. The results are shown in Figs. 14-16, to which reference is now made. 
Fig. 14 shows a time series prediction test for the 2D wave equation. In the graph, (x) 
indicates test values; and (o) indicates network response. Each prediction on the graph 
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shows the network prediction against the observed actual result for randomly selected 
test cases at various times. 

Fig. 15 shows FEM results using the two-dimensional wave equation and 
Example 1 of Table 3 below. The left hand side illustrates refinement with the NN 
5 predictor. The right figures are refined using the standard gradient indicator. 

Figs. 16A and 16B show square and infinite error norms respectively for 
Example 1 using the 2D wave equation. NN results are indicated by x, and standard 
results are indicated by o. 

From the results it is apparent that: 
10 1. the prediction of the gradient was very accurate, (see Fig. 14, and Fig. 

15); and 

2. the improvement in the FEM numerical results was around 10% over 
the standard gradient methods (see Table 3 and Fig. 16); From Fig. 15 it is 
clear the the NN method has chosen to place its resources in the correct places. 
15 Looking at the critical regions, indicated by circles 150, 150% it is apparent 

that the standard method trails behind the NN method. 
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Table Two Hiuii:*iisiou *ixiuitpk?f>> Comparison bctrrccen FEMk rim wirl* (J) Tile 
neural ne^xirk predictor of tlie gradient measure, (ii) "Standard" wfiiiemejita 
using the gradient measure. 

Example 1 

"(-1,0*0 W = /or — 1 < ,V < 1 

-.1,0 « 0 at?// - 1, i) « 0 /or - 1 < :r < 1 

#» , A . * , , ,« / 15;, -( ;r f l My * 0= -i < < 0 . -J < y < 0 

{;e.jf,0) = O a/Ml u(«.y,»)«< 

Number of Initial Elements:^ Time:3 ? Time Step:0.05 
Tliri^hold for reiiiwmtimt m 1 (gradient) 



Metltod Number of Refined Elements 


Average £ a Error 


Average /. oc Error 


NN Modifier 


0.4057 


0.4846 


Standard Modifier $0.1 


0.4314 




Improvement: L 2 error norm •** 


L !3C ermr norm » 





Example 2 

#f = fe# + 5 * ^ : i . -i < if < i 

u(-l.ftf) « 0 (iml w(E }/,/} ~ 0 /or - 1 < ;j < 1 
w(ar,-l.t) =4 0 am/ -.1. 0 » 0 /or - 1 < a? < 1 
7§T ( : * - & 9j - ** ) e*p< is- ) ) ?*(-'\ ?y , 0) ~ nrt:frin( &w{ £ ) ) 

Number of Initial Etenif9»ts:2$. Time:3 : Time Sfep:0.08 





Tltr<=Tsht>!<I for remicuieat ^ 


i 2.2 (gradient) 




Mi&hftd 


Number of Refined Elements 


A vtirfigft X 2 Error 


Average Error 


NN Modifier 


2m 


0.21)62 




Standard Modifier 


232 


0^256 





Iiii|>i-<»veuM!Ui: L 2 error norm •«» 9%. JC 30 crr<*r norm « 1 1% 

Summary 

We have implemented a version of a NN modifier for the FEM mesh; 
designed to adaptively change the mesh based on a prediction of the gradient. In 
experimental work, we have shown that the NN modifier method can accurately 
predict the gradient, and applying modification in this way results in a substantial 
numerical improvement. 

It is appreciated that certain features of the invention, which are, for 
clarity, described in the context of separate embodiments, may also be provided in 
combination in a single embodiment. Conversely, various features of the invention, 
which are, for brevity, described in the context of a single embodiment, may also be 
provided separately or in any suitable subcombination. 

Although the invention has been described in conjunction with specific 
embodiments thereof, it is evident that many alternatives, modifications and variations 
will be apparent to those skilled in the art. Accordingly, it is intended to embrace all 
such alternatives, modifications and variations that fall within the spirit and broad 
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scope of the appended claims. AH publications, patents and patent applications 
mentioned in this specification are herein incorporated in their entirety by reference 
into the specification, to the same extent as if each individual publication, patent or 
patent application was specifically and individually indicated to be incorporated 
herein by reference. In addition, citation or identification of any reference in this 
application shall not be construed as an admission that such reference is available as 
prior art to the present invention. 



